Interactive voice response system

ABSTRACT

A voice response system and method for navigating any network and using facilities and applications provided by various destination nodes within the network. No change is required in the applications provided by the destination nodes. A user can control and navigate the system with no prior knowledge of the system via self-discovery facilities provided as part of a learning system that adapts itself to the user.

[0001] This is a Continuation of International Application PCT/US01/00376, with an international filing date of Jan. 4, 2001, which claims the priority of U.S. Provisional Application No. 60/174,371 filed Jan. 4, 2000.

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The present invention relates to voice-based interactive user interfaces, particularly to interactive voice response systems, and more particularly to interactive voice response systems for accessing information from a computer network via remote telephony devices.

[0004] 2. Description of Related Art

[0005] Voice mail and other interactive voice response (IVR) systems allow a user to access audio information stored in a computer memory such as a hard disk. Typically, the audio information is stored in audio files created either by the user or for the user. Conventional IVR systems use dual-tone multi-frequency (DTMF) signalling to allow the user to interact with the server through a standard telephone keypad. Pre-recorded audio information is available on IVR systems in the form of instructional phrases such as “Please type in your account number followed by the pound sign.”

[0006] Pre-recorded audio is also used for introductory phrases such as “Your account balance is . . . ” At this point, the IVR computer may access a connected database that stores the requested account balance in numerical format, convert the numerical format to an audio format using a numerical text-to-speech engine, and state the account balance. This conversion from numerical format to audio format is extremely rigid and completely predefined. IVR systems are “closed” in that each IVR system is uniquely designed and not connected to a computer network, and IVR systems cannot be used interchangeably. Also, these IVR systems are designed specifically for audio interaction.

[0007] In contrast, audio/visual information on an audio/visual server in a computer network may be accessed using a personal computer. For example, a World Wide Web (Web) page on the Internet may be accessed using a computer linked through an Internet access provider, such as America On Line™ or Prodigy™, to a Web server.

[0008] The Internet has emerged as a mass communications, commerce and entertainment medium. Worldwide, people are enabled to interact, distribute and collect information, create community with individuals sharing similar interests and make purchases electronically. According to International Data Corporation (“IDC”), worldwide e-commerce totaled approximately $32 billion in 1998 and is expected to total over $425 billion in 2002. IDC also projects that worldwide Internet use will grow from approximately 142 million users in 1998 to 502 million users in 2003. In light of the proliferation of Internet usage, Forrester Research projects that global online advertising spending will reach $33 billion by 2004, while online advertising in the U.S. will grow from $2.8 billion in 1999 to $22 billion in 2004.

[0009] The growth of the Internet over the past five years has been nothing short of spectacular, particularly in the U.S. This proliferation, however, is largely confined to westernized countries. Recent studies by Commerce Net and the Stanford Institute for the Quantitative Study of Society have yielded some startling results:

[0010] 92% of the world's population has no access to the Internet

[0011] 90% of the U.S. population also has no access to the Internet at least half of the time

[0012] People are more mobile than ever before

[0013] Cell phone penetration is rapidly increasing

[0014] A quarter of the U.S. population is apprehensive about or experiences difficulty using computers and the Internet

[0015] In certain situations, however, use of a computer may not be feasible or access to a computer may not be possible. For example, a cellular telephone user driving an automobile may want to know about traffic in the surrounding area; however, the user cannot operate a computer while in the car. In situations such as this, an audio interface may be useful for obtaining information from the Internet or another computer network.

[0016] Other situations where an audio interface to a computer network may be useful include accessing an electronic calendar on a local area network (LAN) to receive or modify an itinerary, accessing E-mail on the Internet or a wide-area network (WAN) while away from a computer, and requesting a telephone number from an electronic yellow pages or white pages while at a pay phone. An audio interface to the Web could also be used to traverse the Internet and obtain information residing on various Web servers.

[0017] The telecommunications industry has experienced strong growth over the last decade. Despite its growth, the highly fragmented telecommunications industry is being changed by the emergence of the Internet as a global medium for communication, news, information and commerce. Substantial portions of the commerce and advertising markets remain uncaptured. The proliferation of Internet, cellular and telecommunications users, combined with the global reach and lower cost of distribution in such arenas, has created a powerful channel for delivering entertainment and information and conducting related advertising and commerce.

[0018] It is interesting to note that each area code enables nearly 8 million separate telephone numbers and the total number of area codes in service has nearly doubled since 1991, growing from 119 to 215, according to the FCC. In California alone, the California Public Utilities Commission expects the number of area codes in service to increase from 13 in January 1997, to 40 by 2002. A significant portion of this growth is due to the rapid proliferation of cellular and PCS telephone service. The number of U.S. wireless subscribers is expected to grow to 149 million in 2003, representing a wireless market penetration of 53%. The global wireless penetration is expected to increase from 425 million in 1999 to 953 million in 2003.

[0019] U.S. Pat. No. 5,884,262 discloses a computer document audio access and conversion system that allows a user to access information originally formatted for audio/visual interfacing on a computer network via a simple telephone. Of course, files formatted specifically for audio interfacing can also be accessed by the system. A user can call a designated telephone number and request a file via dual-tone multi-frequency (DTMF) signaling or through voice commands. The system analyzes the request and accesses a predetermined document. The document may be in a standard document file format, such as hyper-text mark-up language (HTML) which is used on the World Wide Web. The document is analyzed by the system, and depending on the different types of formats used in the document, information is translated from an audio/visual format to an audio format and played to the user via the telephone interface. The document may contain links to other documents that can be invoked to access such other documents. In addition, the system can have a native command capability that allows the system to act independently of the accessed document contents to replay a document or carry out functions similar to those available in conventional web browsers.

[0020] The system disclosed in U.S. Pat. No. 5,884,262 is limited to handling information originally formatted for audio/visual interfacing to a computer network via a telephone. There is a need for flexible interactive access to information that is not originally formatted for audio interfacing to a computer network via telephony devices. There is a need for interactive telephony access to a computer network, such as the Internet, to expand and enrich usage with unique and compelling content and products.

SUMMARY OF THE INVENTION

[0021] The present invention is directed to an interactive voice response system that permits users to access information that is not originally formatted for audio interfacing to an information exchange network, such as a computer network. A user's spoken utterance is analyzed and matched with an index of destinations. A list of valid destinations is produced and the user is then guided along the path with pre-recorded voice prompts. The user accessing the system can control the navigation via more speech and/or telephone keypad entry. The intent of the system is to arrive at a single destination from among the many offered within the system.

[0022] The decision to choose a valid destination is driven by a variety of factors:

[0023] User preferences

[0024] User profile derived from usage pattern history

[0025] User responses

[0026] Advertiser rules

[0027] Utterance match weightage

[0028] Active context

[0029] Call origin

[0030] Call date/time

[0031] Call length
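The factors listed above are not given any particular combination rule in this description. As a minimal sketch only, assuming a simple weighted sum, the selection of a single destination could look like the following. The weight values, the factor keys and the `Candidate` type are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical weights: the description names the factors but not how they combine.
WEIGHTS = {
    "utterance_match": 0.5,
    "user_preference": 0.2,
    "usage_history": 0.2,
    "advertiser_rule": 0.1,
}


@dataclass
class Candidate:
    name: str
    factors: dict = field(default_factory=dict)  # factor name -> score in [0, 1]


def score(candidate):
    """Weighted sum of the factor scores; missing factors count as zero."""
    return sum(w * candidate.factors.get(k, 0.0) for k, w in WEIGHTS.items())


def choose_destination(candidates):
    """Return the single best-scoring candidate, or None for an empty list."""
    return max(candidates, key=score, default=None)


candidates = [
    Candidate("weather", {"utterance_match": 0.9, "user_preference": 0.5}),
    Candidate("stock quotes", {"utterance_match": 0.4, "usage_history": 1.0}),
]
best = choose_destination(candidates)
```

In this sketch the strong utterance match outweighs the usage history, so the "weather" candidate is chosen; other factors such as call origin or call date/time could be added as further keys.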

[0032] The destination that is derived earlier is then accessed via spoken utterance and/or telephone keypad entry. User specific information about the destination is derived from the user profile and the current call context and is used to offer access to the facilities offered by the destination. The facilities offered are specific to the application provided by the destination node.

[0033] User responses and queries are appropriately translated to the destination format and vice versa. All of the interaction is via concatenated pre-recorded or synthesized voice segments or fragments.

[0034] The inventive voice response system includes a number of novel functional and logical components, including without limitation a query engine, ad generator, web parser, profiler and replication engine, managed by a manager. These components may physically reside in the same or different servers.

[0035] The present invention will be described in reference to “HeyAnita”, and in the alternative “Anita”, both of which refer to the commercial system launched by HeyAnita, Inc. (www.heyanita.com).

[0036] HeyAnita Inc.'s proposed solution is to enable the world's population to access, by voice, the wealth of information and applications available on the Internet, using any type of phone: rotary, touchtone or wireless. The rationale behind this vision is threefold:

[0037] 1. Everyone knows how to use a telephone.

[0038] 2. Most cities in the world already have reliable land-line phones as well as wireless infrastructure.

[0039] 3. The easiest user interface is the speaker's natural language, both spoken and heard.

[0040] As competition within Internet and cellular usage intensifies, high traffic Internet portals, other e-commerce providers and traditional companies will continue to seek ways to expand and enrich their consumer offerings with unique and compelling content and products. This will create significant opportunities for HeyAnita to connect eyeballs to eardrums, thereby enabling these companies to target and reach a significantly expanded audience.

BRIEF DESCRIPTION OF THE DRAWINGS

[0041] FIG. 1 is a schematic representation of the Anita Server Architecture.

[0042] FIG. 2 is a schematic representation of the logical internal structure of Anita Server.

[0043] FIG. 3 is a schematic representation of the overall HeyAnita global infrastructure that comprises Anita Servers in various countries, cities, and other locales.

[0044] FIG. 4 illustrates one embodiment of a “tree” structure that exemplifies how clarification questions would be asked while narrowing down a search.

[0045] FIG. 5 is a schematic representation of the HeyAnita Operating System.

DETAILED DESCRIPTION OF THE INVENTION

[0046] The present description is of the best presently contemplated mode of carrying out the invention. This description is made for the purpose of illustrating the general principles of the invention and should not be taken in a limiting sense. The scope of the invention is best determined by reference to the appended claims.

[0047] The present invention will be described below in reference to the Internet as an example of an information exchange network. The present invention is applicable to other types of information network without departing from the scope and spirit of the present invention.

[0048] The HeyAnita Solution

[0049] HeyAnita enables individuals to surf the Internet from any phone, anywhere, anytime simply by using their voice. By utilizing its revolutionary HeyAnita operating system (“HeyAnita OS”) technology and easy to use interface, HeyAnita establishes a comprehensive Voice Internet Portal (“VIP”), providing a voice interface to the Internet and allowing Internet and telephone users to access volumes of information, headline news, stock quotes, horoscopes, auctions, food delivery services, weather forecasts, sports scores, travel, shipping status, free integrated voice mail, and much more. In addition, HeyAnita enables e-commerce providers to add voice application (v-application) services to their existing platform and enables traditional corporations to efficiently compete in the digital arena. HeyAnita's unique solution increases traffic and commerce by providing access to individuals who do not use traditional Web-based browsers and also allows traditional Internet users access from locations lacking connectivity.

[0050] HeyAnita uses its proprietary technology and easy to use interface to create an informative and entertaining environment to attract and retain a large and loyal user base. In addition to its easily brandable name and concept, HeyAnita offers the most comprehensive array of voice enabled services and allows phone users to access the Internet in multiple languages. Appendix B sets forth some of the application features possible with the inventive HeyAnita system.

[0051] Architecture

[0052] HeyAnita Voice Platform is a set of components based on Microsoft Windows DNA architecture that allows developers and power-users to rapidly develop and deploy speech applications. The platform is an open environment that encapsulates a speech recognition engine, audio input sources (speaker, telephone) and audio output sources (speaker, telephone). It provides a vendor independent interface to the voice application by providing a consistent interface to the various audio devices and the speech recognition engine.

[0053] Any application written to these interfaces can be ported from one device to another or from one speech recognition vendor to another merely by creating the appropriate object. For example, developers can develop and test their voice applications using a PC speaker and a microphone and then move the application to the telephone just by creating objects that support the telephone device.

[0054] The primary design considerations, features and functionalities for the HeyAnita Voice Platform are:

[0055] Device Transparency: HeyAnita Voice Platform is not tied to any hardware device. It provides plug-and-play flexibility to switch the underlying hardware without having to modify the actual application. Because of this, developers do not need any special hardware to write and test their applications. They will be able to write their applications on standard Microsoft Windows PCs and deploy them on any telephony platform.

[0056] Speech Recognition Engine Transparency: HeyAnita Voice Platform is not tied to any specific speech recognition engine. It provides plug-and-play flexibility to switch the underlying speech recognition engine without having to modify the actual application. Developers will be able to develop applications on any shareware speech recognition engine and later deploy them on any of the popular commercial speech recognition engines such as Speechworks or Nuance.

[0057] Language of Choice: HeyAnita Voice Platform does not force developers to learn a new language such as VXML. In addition to W3C VXML, HeyAnita Voice Platform allows developers to write applications in a language of their choice. For instance, any COM-compliant language such as Visual Basic, Visual C++ or Java can be used to develop applications on the HeyAnita Voice Platform.

[0058] Rich VUI: HeyAnita Voice Platform's open architecture allows developers to plug in third-party components to make their Voice User Interfaces richer. Developers do not have to settle for mediocre Voice Interfaces because of the limitations in the platform or language.

[0059] Location Transparency: HeyAnita Voice Platform allows developers to host their applications on any server on the Internet. All the pieces of HeyAnita Voice Platform are developed with location transparency in mind.

[0060] Multiple Language Support: HeyAnita Voice Platform has been designed to support international languages. Any application written on HeyAnita Voice Platform can be localized in any international language without any code changes.

[0061] HeyAnita Voice Platform/HeyAnita OS:

[0062] HeyAnita OS is a multi-threaded surrogate process that hosts all the HeyAnita components and application objects. It takes care of all thread management, monitoring and administration so that application writers do not have to worry about issues such as thread synchronization. FIG. 5 shows the components of the HeyAnita OS (100).

[0063] HeyAnita Speech Objects (110):

[0064] These are a set of COM+ components that encapsulate hardware devices and speech recognition engines. Once the applications are written using these interfaces, they can be ported easily from one hardware device to another or from one recognition engine to another by simply replacing the corresponding HeyAnita Speech Object.

[0065] Speech Recognition Manager (SR): This object encapsulates the speech recognition engine and the text to speech engines and provides a consistent interface to these engines in a vendor independent fashion.

[0066] Audio Source (AI): This object encapsulates the audio input device and provides a consistent interface in a device independent fashion.

[0067] Audio Destination (AO): This object encapsulates the audio output device and provides a consistent interface in a device independent fashion.

[0068] Grammar Object (GO): This object provides a consistent interface to provide grammar files for speech recognition. The grammar files can reside anywhere on the Internet. The grammar object refers to the grammar files by URI.

[0069] Prompt Object (PO): This object provides a consistent interface to provide prompts in speech applications. The prompts can reside anywhere on the Internet. The prompt object refers to the prompt files by URI.

[0070] A typical voice application will create an SR object for speech recognition, an AI object as an audio input object, an AO object as an audio output, a GO object for recognizing speech and several PO objects for the various prompts it may require. The application can then play the prompts using the audio out object, accept input using the audio in object and recognize the input using the speech recognition object while the grammar object gives context to the speech recognition object.
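The interplay of these objects can be illustrated with a short sketch. The actual HeyAnita Speech Objects are COM+ components whose interfaces are not reproduced in this description, so the Python classes below (`AudioSource`, `AudioDestination`, `Recognizer`, `run_dialog`) are hypothetical stand-ins that only show the idea of device transparency: the dialog code stays the same when the source or destination object is swapped.

```python
from abc import ABC, abstractmethod


class AudioSource(ABC):
    """Stand-in for the AI object: yields one caller utterance at a time."""

    @abstractmethod
    def read(self):
        ...


class AudioDestination(ABC):
    """Stand-in for the AO object: plays one prompt."""

    @abstractmethod
    def play(self, prompt):
        ...


class ScriptedSource(AudioSource):
    """Test device that replays canned utterances instead of a phone line."""

    def __init__(self, utterances):
        self._utterances = list(utterances)

    def read(self):
        return self._utterances.pop(0)


class BufferDestination(AudioDestination):
    """Test device that records prompts instead of playing audio."""

    def __init__(self):
        self.played = []

    def play(self, prompt):
        self.played.append(prompt)


class Recognizer:
    """Stand-in for the SR object; accepts only phrases in the active grammar."""

    def recognize(self, utterance, grammar):
        text = utterance.strip().lower()
        return text if text in grammar else None


def run_dialog(source, dest, recognizer, grammar, prompt):
    """Play a prompt, read one utterance, and recognize it against the grammar."""
    dest.play(prompt)
    return recognizer.recognize(source.read(), grammar)


out = BufferDestination()
result = run_dialog(ScriptedSource(["Weather"]), out, Recognizer(),
                    {"weather", "stocks"}, "What would you like?")
```

Moving such an application from a desktop test rig to a telephone would, under this assumption, only require passing different `AudioSource` and `AudioDestination` implementations to `run_dialog`.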

[0071] HeyAnita Agent (116):

[0072] HeyAnita Agent is a set of COM+ objects that allow speech applications to access data in a consistent manner. This makes speech applications transparent to the underlying data format. Applications access data in any OLE DB-compliant database, XML page, HTML page or WAP page using the same programming model.
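A uniform programming model over different data formats might be sketched as follows. This is an illustration under stated assumptions rather than the HeyAnita Agent interface: the `lookup` method, the `quotes` table and the XML layout are hypothetical, and an in-memory SQLite database stands in for an OLE DB-compliant database.

```python
import sqlite3
import xml.etree.ElementTree as ET


class XmlSource:
    """Looks up a quote in an XML document of the assumed form <quote symbol="...">."""

    def __init__(self, xml_text):
        self._root = ET.fromstring(xml_text)

    def lookup(self, key):
        node = self._root.find(f"./quote[@symbol='{key}']")
        return node.text if node is not None else None


class SqlSource:
    """Looks up a quote in a relational table named quotes (assumed schema)."""

    def __init__(self, connection):
        self._conn = connection

    def lookup(self, key):
        row = self._conn.execute(
            "SELECT price FROM quotes WHERE symbol = ?", (key,)
        ).fetchone()
        return row[0] if row else None


def speak_quote(source, symbol):
    """Application code: identical for every source that offers lookup()."""
    price = source.lookup(symbol)
    if price is None:
        return f"No quote for {symbol}"
    return f"{symbol} is trading at {price}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (symbol TEXT, price TEXT)")
conn.execute("INSERT INTO quotes VALUES (?, ?)", ("ACME", "12.50"))

xml_source = XmlSource("<quotes><quote symbol='ACME'>12.50</quote></quotes>")
sql_source = SqlSource(conn)
```

Both `speak_quote(xml_source, "ACME")` and `speak_quote(sql_source, "ACME")` go through the same application code, which is the property the paragraph above describes.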

[0073] Speech Applications (114):

[0074] Speech applications are written as a set of COM+ components or VXML files. These applications can be written in any COM-compliant language such as Visual Basic, Visual C++ or Java. It is also possible to write an application using multiple languages, e.g., it is possible to make use of a VXML file inside a Visual Basic speech application. This flexibility allows developers to write voice applications faster and in the language they are most comfortable with.

[0075] Applications written to HeyAnita speech platforms do not have to reside on the same server as the platform. These COM+ components can be installed locally on the telephony server or on any remote machine. In fact, these applications can reside anywhere on the Internet. Applications on the Internet communicate with the platform using SOAP.

[0076] HeyAnita Tools/Wizards (118):

[0077] HeyAnita tools are a set of design time controls (DTCs) that allow developers to quickly generate Speech Applications in a drag-and-drop fashion. Developers do not have to learn a new language such as VXML. All the code is generated by these design time controls. These tools are provided for all components included in the HeyAnita framework. In addition to the DTCs, add-ins are provided for Office to facilitate easy authoring of content.

[0078] Many components from the HeyAnita framework have associated metadata and data elements. Tools are provided for easy management of this content. Application wizards are provided for popular functions, such as a “shopping cart”, “get a stock quote” etc. In addition, since the HeyAnita wizard model is a Visual Studio DTC, developers can create their own wizards or extend existing ones.

[0079] HeyAnita Framework (112):

[0080] HeyAnita framework provides a number of plug-and-play COM+ components to facilitate rapid development and deployment of voice applications. Using these components as building blocks and writing just the code to glue them together, programmers can create voice applications in a matter of hours. All the necessary voice user interface, grammars and functionality are implemented by these components. All the components contain the necessary audio prompts and grammars. Developers, however, have the ability to override these by customizing their prompts or grammars.

[0081] This is an extensible, open framework. It allows developers to add new value-added components to this framework by simply exposing a set of published COM+ interfaces. Most of the HeyAnita portal applications are built using this framework.

[0082] Depending on the functionality, these components fall into one of the following categories:

[0083] Basic Components: These are basic building blocks for constructing a voice application. When developers use these components, they automatically get consistent and easy-to-use voice interfaces across all their applications.

[0084] Data-bound components: These components implement standardized voice interface on top of commonly used data elements.

[0085] Value-added components: Value-added components provide all the bells and whistles for making voice user interface entertaining and fun-to-use.

[0086] Basic Components:

[0087] The HeyAnita framework may include the following basic components:

[0088] 1. Sentence: Plays back a set of sentences.

[0089] 2. Input: Gets voice command input from the user.

[0090] 3. Menu: Implements smart voice menu.

[0091] 4. Number: Plays back a number.

[0092] 5. Currency: Plays back currency.

[0093] 6. Date: Plays back date.

[0094] 7. Time: Plays back time.

[0095] 8. Credit Card: Gets credit card information from the user.

[0096] 9. Social Security Number: Gets social security number from the user.

[0097] 10. Name: Gets name information from the user.

[0098] 11. Address: Gets address information from the user.

[0099] 12. VXML Parser: Parses and executes a W3C compatible VXML stream.

[0100] Data-bound Components:

[0101] The HeyAnita framework may include the following data-bound components:

[0102] 1. Stock Quote: Retrieves individual stock quotes.

[0103] 2. Portfolio: Retrieves quotes for all the stocks in the portfolio. Also, allows the users to manage their portfolios.

[0104] 3. Weather: Retrieves weather information

[0105] 4. Movie Show Times: Retrieves movie show times

[0106] 5. Movie Previews: Retrieves movie previews

[0107] 6. Store/Service Locator: Locates a store or a service

[0108] 7. Status Inquiry: Checks status of an order, shipment

[0109] 8. Yellow Pages: Handles yellow pages inquiries

[0110] Developers will be able to bind these to any OLE DB provider or XML repository to retrieve the necessary data.

[0111] Value-Added Components:

[0112] The HeyAnita framework may include the following value-added components:

[0113] 1. AdMixer: Selects advertisements based on the user's preferences and history.

[0114] 2. Randomize: Randomizes selection of audio prompts (from a pre-defined set).

[0115] 3. Joke-of-the-day: Selects a joke of the day.

[0116] 4. Login: Allows users to login.

[0117] 5. Registration: Allows users to register.

[0118] 6. Debug: Adds debugging trace to the voice application.

[0119] 7. Notifications/Alerts: Sends outbound notifications/alerts.

[0120] Anita Server

[0121] One of the primary components of the HeyAnita system is the Anita Server 120 (FIG. 1) that implements the HeyAnita Voice Platform, which consists of several components to implement the following functionality and features:

[0122] 1. Wait for an incoming call

[0123] 2. When a call is received, listen to user's voice as commands and/or free-form speech or telephone keypad entry

[0124] 3. Decompose spoken utterance into proprietary commands using proprietary word mapping techniques and voice recognition grammar

[0125] 4. Ask relevant questions in order to determine user preferences and context

[0126] 5. Identify the destination using proprietary search algorithms within the destination tree

[0127] 6. Navigate to the destination and retrieve requested information

[0128] 7. Translate retrieved information into voice prompts

[0129] 8. Generate commercials based on user preferences, usage history patterns and context

[0130] 9. Intermix commercials and information in a seamless manner to generate unique entertaining experience for the user

[0131] 10. Return information back to the user in the form of concatenated speech fragments and/or synthesized voice
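Steps 4 and 5 above, asking clarifying questions while narrowing down to one destination in a tree, can be sketched in a few lines. The search algorithms themselves are described as proprietary and are not disclosed here, so the tree contents, the node layout and the `narrow` function below are hypothetical and show only the general walk from a root question to a leaf destination.

```python
# Hypothetical destination tree: inner nodes hold a clarification question,
# leaves hold a destination name.
TREE = {
    "question": "News or weather?",
    "children": {
        "news": {
            "question": "Headlines or sports?",
            "children": {
                "headlines": {"destination": "headline-news"},
                "sports": {"destination": "sports-scores"},
            },
        },
        "weather": {"destination": "weather-forecast"},
    },
}


def narrow(tree, answers):
    """Walk the tree with the caller's answers.

    Returns (destination, questions_asked); destination is None when an
    answer is missing or does not match any child of the current node.
    """
    asked = []
    node = tree
    remaining = iter(answers)
    while "destination" not in node:
        asked.append(node["question"])
        answer = next(remaining, None)
        if answer not in node["children"]:
            return None, asked
        node = node["children"][answer]
    return node["destination"], asked
```

For example, `narrow(TREE, ["news", "sports"])` asks two questions and ends at the `sports-scores` leaf, while `narrow(TREE, ["weather"])` needs only one.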

[0132] Anita Server—Architecture

[0133] FIG. 1 is a schematic representation of the Anita Server Architecture. The Anita Server 120 is a fault tolerant, scaleable, remotely manageable, multi-threaded NT Service. This comprises the following components:

[0134] a. Anita Telephone Interface (1)

[0135] Implements call management features such as ring and hangup detection, call switch-over, call transfer, call waiting and tromboning. This also implements functionality to transform computer audio files (.wav files) to audio streams that can be played on a telephone 15 and to detect user utterances on the phone line to pass them on to the Anita Speech Recognition Engine. This may be implemented using Dialogic system software version DNA 3.2 and Nuance Speech recognition system version 6.2.

[0136] b. Anita Speech Recognition Engine (2)

[0137] Translates spoken utterances to a set of text phrases. This engine supports a number of languages and is speaker independent. This may be implemented using Nuance Speech recognition system version 6.2. This engine serves as input to the Anita Natural Language Engine, described below.

[0138] c. Anita Natural Language Engine (3)

[0139] Converts natural language sentences to a set of structured commands. These structured commands are then used to drive Anita Query Engine. The Anita Natural Language Engine in conjunction with Anita Query Engine identifies destination nodes and the applications that are available to the user. This engine serves as input to the Anita Query Engine, described below.

[0140] d. Anita Query Engine (4)

[0141] Maps commands to an application defined using the HeyAnita Speech Objects 110 and Speech Applications 114, or HeyAnita function library (see example in Appendix A) and state machine definition language. An example of an application would be to obtain weather information using the Yahoo! Web site. This would provide a user of the system the capability of listening to weather information for a set of cities or zip codes. The Anita Query Engine does the following:

[0142] 1) Play voice prompts for the user to exactly identify an application

[0143] 2) Generate web URLs to initiate execution of the selected application

[0144] 3) Hand over control to the Anita State Machine and Web Parser, described below
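The second step, generating a web URL for the selected application, can be illustrated with a minimal sketch. The application table, the host name `weather.example.com` and the parameter name `zip` are placeholders invented for this example and do not describe any real site or the actual Anita Repository contents.

```python
from urllib.parse import urlencode

# Hypothetical application table; host and parameter names are placeholders.
APPLICATIONS = {
    "weather": {
        "base": "https://weather.example.com/forecast",
        "params": ["zip"],
    },
}


def build_url(app_name, **values):
    """Build the request URL for an application from caller-supplied values."""
    app = APPLICATIONS[app_name]
    missing = [p for p in app["params"] if p not in values]
    if missing:
        # The caller has not yet supplied everything; a prompt would be needed.
        raise ValueError(f"missing parameters: {missing}")
    query = urlencode({p: values[p] for p in app["params"]})
    return f"{app['base']}?{query}"


url = build_url("weather", zip="90210")
```

A missing parameter raises an error in this sketch; in a dialog system that condition would more naturally trigger a further voice prompt.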

[0145] e. Anita State Machine and Web Parser (8)

[0146] Anita State Machine and Web Parser executes state machines written using a proprietary function library. It retrieves information from web sites and other applications that are enabled for this operation. In addition, its web-parsing function also allows Anita Query Engine to retrieve web pages from any conventional web site on the Internet and convert unstructured HTML data into meaningful structured data. It is not mandatory to make changes to existing web sites to make them work with Anita State Machine and Web Parser. An example of this would be the operations performed to pass in a zip code to the Yahoo! web site, execute the form to retrieve the results, select and format the results, and play relevant information in the form of concatenated speech fragments. In this scenario the Yahoo! web site was not modified to support the operations nor was it aware that a voice-enabled application was using its HTML based services.
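Converting unstructured HTML into structured data can be sketched with Python's standard `html.parser` module. This is not the proprietary function library mentioned above; the `TableExtractor` class, the `parse_forecast` helper and the sample page are assumptions made for illustration, and the sample markup is invented rather than taken from any real site.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only text inside a cell is kept; layout whitespace is ignored.
        if self._in_cell:
            self._row[-1] += data.strip()


def parse_forecast(html):
    """Return one dict per data row, keyed by the header row's cell text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


SAMPLE = """
<table>
  <tr><th>Day</th><th>High</th><th>Low</th></tr>
  <tr><td>Mon</td><td>72</td><td>55</td></tr>
  <tr><td>Tue</td><td><b>68</b></td><td>54</td></tr>
</table>
"""
forecast = parse_forecast(SAMPLE)
```

The resulting list of dictionaries is the kind of structured dataset that a prompt generator could then turn into speech, without the originating site being modified.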

[0147] f. Anita Profiler (10)

[0148] During each user session, Anita Query Engine transfers relevant information to Anita Profiler. Anita Profiler captures and filters this information to build a repository of user preferences, navigational history and usage patterns. Anita Profiler recognizes the phone number of the incoming caller and can work without any user registration.
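A profile keyed on the caller's phone number, built up without registration, might be sketched as follows. The `Profiler` class and its method names are hypothetical, and the sketch keeps counts in memory only, whereas the description stores such data in a repository.

```python
from collections import Counter, defaultdict


class Profiler:
    """Keeps per-caller usage counts keyed by the incoming phone number."""

    def __init__(self):
        self._history = defaultdict(Counter)

    def record(self, caller_id, destination):
        """Note that this caller reached this destination once more."""
        self._history[caller_id][destination] += 1

    def top_destinations(self, caller_id, n=3):
        """Most frequently used destinations first; empty for unknown callers."""
        counts = self._history.get(caller_id, Counter())
        return [name for name, _ in counts.most_common(n)]


profiler = Profiler()
profiler.record("+15550100", "weather")
profiler.record("+15550100", "stocks")
profiler.record("+15550100", "weather")
```

Here `profiler.top_destinations("+15550100")` lists "weather" ahead of "stocks", and a number that has never called yields an empty list.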

[0149] g. Anita Ad Generator/Mixer (9)

[0150] Implements complex algorithms to create an entertaining experience for the user by mixing advertisements and information in a seamless manner. This algorithm is based on a variety of factors such as user preferences and usage patterns, advertisers' rules and currently active context.
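The mixing algorithm itself is not disclosed. The following sketch is a deliberately simple stand-in, assuming that each advertisement carries descriptive tags: it picks the ad whose tags overlap most with the caller's interests and the active context, then interleaves it with the information segments. The `pick_ad` and `mix` functions and the ad records are hypothetical, and advertiser rules are left out for brevity.

```python
def pick_ad(ads, interests, context):
    """Pick the ad sharing the most tags with the interests and active context."""
    wanted = set(interests) | {context}
    best = max(ads, key=lambda ad: len(wanted & set(ad["tags"])), default=None)
    if best is None or not wanted & set(best["tags"]):
        return None  # nothing relevant: play no ad rather than a random one
    return best


def mix(info_segments, ad, every=2):
    """Insert the ad prompt after every `every` information segments."""
    if ad is None:
        return list(info_segments)
    mixed = []
    for i, segment in enumerate(info_segments, start=1):
        mixed.append(segment)
        if i % every == 0:
            mixed.append(ad["prompt"])
    return mixed


ads = [
    {"prompt": "ad-umbrella", "tags": ["weather", "rain"]},
    {"prompt": "ad-broker", "tags": ["stocks"]},
]
chosen = pick_ad(ads, interests=["rain"], context="weather")
playlist = mix(["today", "tomorrow", "weekend"], chosen)
```

With these inputs the umbrella ad is chosen and placed after the second information segment.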

[0151] h. Anita Prompt Generator (6)

[0152] Converts text phrases to audio prompts. Unlike most other text-to-speech engines, Anita Prompt Generator implements algorithms to generate prompts in natural human voice using concatenated speech fragments rather than digitally created voice. However, in cases of completely unstructured text, Anita Prompt Generator uses Text-To-Speech software. This software may be based on Fonix Corporation TTS engine.
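One way to picture prompt generation from concatenated fragments is a greedy longest-match over a library of recorded phrases, falling back to text-to-speech where no recording exists. The fragment library, the file names and the per-word fallback in `plan_prompt` are assumptions for this sketch; the description does not state how the actual generator segments text or at what granularity it falls back to TTS.

```python
# Hypothetical library of pre-recorded fragments: phrase -> audio file name.
FRAGMENTS = {
    "the high today is": "high_today.wav",
    "seventy": "70.wav",
    "two": "2.wav",
    "degrees": "degrees.wav",
}


def plan_prompt(text):
    """Split text into recorded fragments, longest match first.

    Returns a list of ("wav", file_name) or ("tts", word) steps, in order.
    """
    words = text.lower().split()
    plan = []
    i = 0
    while i < len(words):
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in FRAGMENTS:
                plan.append(("wav", FRAGMENTS[phrase]))
                i = j
                break
        else:
            # No recording covers this word: hand it to text-to-speech.
            plan.append(("tts", words[i]))
            i += 1
    return plan


plan = plan_prompt("The high today is seventy two degrees")
```

The resulting plan plays four recorded files in sequence; a phrase containing an unrecorded word would produce a mixed plan with a `"tts"` step for that word.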

[0153] i. Anita Repository (7)

[0154] All the Anita components are meta-data driven. All the data required to drive these components is stored in Anita Repository. This allows Anita developers to generate new voice applications in a matter of hours by simply adding the necessary meta-data to Anita Repository. This meta-data is stored in the form of relational database tables.

[0155] j. Anita Replication Engine (12)

[0156] Smart replication engine that allows distribution of Anita Repository information to multiple Anita Servers in a reliable manner. This algorithm uses user preferences and usage patterns to replicate only the necessary information in order to avoid replication storms. In addition to Anita Repository data, Anita Replication Engine also distributes and applies software updates to all Anita Servers including itself.
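The idea of replicating only what each server needs can be sketched as a filter over repository records. The record layout, the notion of a per-server set of topics of interest and the `plan_replication` function are hypothetical; the description does not disclose how usage patterns are turned into replication decisions.

```python
def plan_replication(records, server_interest):
    """Map each server to the ids of records whose topic its callers use."""
    plan = {}
    for server, topics in server_interest.items():
        plan[server] = sorted(r["id"] for r in records if r["topic"] in topics)
    return plan


records = [
    {"id": 1, "topic": "weather"},
    {"id": 2, "topic": "stocks"},
    {"id": 3, "topic": "cricket"},
]
server_interest = {
    "los-angeles": {"weather", "stocks"},
    "mumbai": {"weather", "cricket"},
}
replication_plan = plan_replication(records, server_interest)
```

Each server receives two of the three records in this example, rather than a full copy of the repository.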

[0157] k. Anita Manager (13)

[0158] Implements a set of standard interfaces for remotely monitoring and managing Anita Server components. These interfaces are used by Anita Toolbox to remotely monitor and manage Anita Server components.

[0159] Anita Server—Process

[0160] 1. When a user calls, Anita Telephone Interface 1 receives the call and hands it over to Anita Speech Recognition Engine 2.

[0161] 2. Anita Speech Recognition Engine 2 converts spoken utterance into text and sends it to Anita Natural Language Engine 3 for further processing.

[0162] 3. Anita Natural Language Engine 3 interprets Natural Language text and sends structured commands to Anita Query Engine 4.

[0163] 4. Anita Query Engine 4 takes into consideration all of the governing factors such as user preferences, user context, usage patterns and history to determine an end destination node for the user's request.

[0164] 5. Anita Query Engine 4 generates web queries needed to fulfill user's request and sends them to the Anita State Machine and Web Parser 8.

[0165] 6. Anita State Machine and Web Parser 8 browses the Internet/web 11 to retrieve information requested by the user. It parses each received page to convert unstructured text into structured datasets.

[0166] 7. While Anita State Machine and Web Parser 8 is busy retrieving the requested information, Anita Query Engine 4 asks Anita Prompt Generator 6 to generate context-sensitive voice prompts. It also sends a request to Anita Profiler to add generated queries to the user's profile.

[0167] 8. Anita Prompt Generator 6 asks Anita Ad Generator 9 to create a set of entertaining commercials based on user's preferences and context.

[0168] 9. Anita Ad Generator 9 asks Anita Profiler 10 for the user preference and usage history data and uses it to select appropriate commercials.

[0169] 10. Anita Prompt Generator 6 creates an audio stream based oncommercials and web information returned by Anita State Machine and WebParser 8 and sends it to Anita Telephone Interface 12.
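The request flow in steps 1 to 10 can be sketched as a chain of stages. This is a simplified illustration only, not the disclosed implementation: speech recognition and web retrieval are replaced by stubs, the ad generation and profiling steps (7 to 9) are omitted, and every name below (`CallContext`, `DESTINATIONS`, `handle_call`, the stage functions) is hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical intent-to-destination table; not taken from the specification.
DESTINATIONS = {"weather": "weather.example.com"}


@dataclass
class CallContext:
    """State carried through one request; all field names are assumptions."""
    utterance: str
    text: str = ""
    command: dict = field(default_factory=dict)
    destination: str = ""
    datasets: list = field(default_factory=list)
    stages: list = field(default_factory=list)


def recognize_speech(ctx):
    """Step 2: stand-in for speech recognition (input is already text here)."""
    ctx.text = ctx.utterance.lower().strip()
    ctx.stages.append("speech_recognition")
    return ctx


def interpret(ctx):
    """Step 3: toy keyword match producing a structured command."""
    intent = "weather" if "weather" in ctx.text else "unknown"
    ctx.command = {"intent": intent}
    ctx.stages.append("natural_language")
    return ctx


def route_query(ctx):
    """Steps 4-5: choose an end destination node for the command."""
    ctx.destination = DESTINATIONS.get(ctx.command["intent"], "")
    ctx.stages.append("query_engine")
    return ctx


def fetch_and_parse(ctx):
    """Step 6: stub for web retrieval; returns a canned structured dataset."""
    if ctx.destination:
        ctx.datasets = [{"source": ctx.destination, "forecast": "sunny"}]
    ctx.stages.append("web_parser")
    return ctx


def generate_prompt(ctx):
    """Step 10: build the reply text that would be rendered as audio."""
    ctx.stages.append("prompt_generator")
    if not ctx.datasets:
        return "Sorry, I could not find that. How can I help you?"
    return "The forecast is " + ctx.datasets[0]["forecast"] + "."


def handle_call(utterance):
    """Run one utterance through the stages in order."""
    ctx = CallContext(utterance)
    for stage in (recognize_speech, interpret, route_query, fetch_and_parse):
        ctx = stage(ctx)
    return generate_prompt(ctx), ctx.stages


reply, stages = handle_call("What is the weather forecast?")
```

Each stage only reads and extends the shared context, which mirrors the hand-off from one component to the next in the numbered steps.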

[0170] Anita Server—Logical Structure

[0171] FIG. 2 is a schematic representation of the logical internal structure of Anita Server 120:

[0172] Anita Server 120 consists of three logical servers. These servers could be implemented on one physical box or multiple physical boxes based on the size and load at each Anita site. If they are implemented on multiple boxes, all the boxes are connected on a single high-bandwidth LAN segment.

[0173] a. Anita Phone Server (20)

[0174] Anita Phone Server 20 implements the computer telephony interface using CTI hardware 21, Anita Telephone Interface 1, Anita Speech Recognition Engine 2, and Anita Prompt Generator 6. It connects to one or more digital lines to accept telephone calls.

[0175] b. Anita Application Server (30)

[0176] Anita Application Server 30 implements Anita applications using Anita Natural Language Engine 3, Anita Query Engine 4, Anita State Machine and Web Parser 8, Anita Profiler 10 and Anita Ad Generator/Mixer 9. This server is connected to the Internet using high-bandwidth lines. It also implements smart replication using Anita Replication Engine 12.

[0177] c. Anita Database Server (40)

[0178] Anita Database Server 40 implements Anita Repository 7 database.

[0179] Anita Toolbox

[0180] To complement the features and functions of the Anita Server, the Anita Toolbox (see FIG. 5, 118) provides a comprehensive set of tools to help business partners and developers:

[0181] 1) Voice-enable existing web-sites and/or applications

[0182] 2) Build voice-enabled v-applications. This uses the function library to build state machines that can be executed by the Anita State Machine and Web Parser

[0183] 3) Remotely monitor and manage multiple Anita Servers

[0184] HeyAnita Infrastructure

[0185] FIG. 3 is a schematic representation of the overall HeyAnita global infrastructure that comprises Anita Servers 120 in various countries, cities, and other locales. The Anita Servers 120 communicate with each other via a network such as the Internet 11. The Anita Replication Engine 12 in the Anita Servers 120 distributes Anita Repository 7 information to other Anita Servers 120. Anita Monitoring Stations 122 are provided to monitor and manage the interaction between the Anita Servers 120. The Anita Monitoring Stations 122 may be Anita Servers 120 which are configured for monitoring as their primary function. They may be similar to the Anita Managers 13.

[0186] HeyAnita Usage Scenarios

[0187] User Profile

[0188] The user is registered with the system. She wants HeyAnita to recognize her based on the caller ID of her phone without asking for any password/PIN. She wants to travel to New York City and wants the system to help her make arrangements. She prefers using Expedia.com for travel reservations and has a New York office in Manhattan.

[0189] Feature: Free Form Query

[0190] Users can use plain language utterances to navigate the system and interact with an automated attendant (“Assistant”) at HeyAnita.

[0191] Example 1: Weather

[0192] Assistant: How can I help you?

[0193] User: What is the weather forecast?

[0194] Assistant: Weather in your home city of Los Angeles is sunny and seventy degrees with a high expected to be seventy-six and a low expected to be sixty. Would you like the extended forecast for Los Angeles?

[0195] User: What's the weather like in New York?

[0196] Assistant: It's sunny and seventy degrees with a high expected to be seventy-six and a low expected to be sixty. Would you like the extended forecast for New York?

[0197] Example 2: Buying a CD

[0198] Assistant: How can I help you?

[0199] User: I want to buy the new Guns and Roses CD

[0200] Assistant: Please wait while I find the cheapest price for you. CD-now has it for eleven dollars and ten cents. Would you like to buy it now?

[0201] Example 3: Stock Quotes

[0202] Assistant: How can I help you?

[0203] User: I want to know the stock price for Microsoft

[0204] Assistant: Microsoft is currently trading at eighty-eight dollars.

[0205] Feature: Clarification Questions

[0206] If the user's query does not narrow the result set down to one destination, then clarification questions are asked.

[0207] Example 1: Traffic

[0208] Assistant: How can I help you?

[0209] User: Please tell the traffic conditions

[0210] Assistant: Please say the freeway and direction that you are interested in

[0211] User: 4 oh 5 South

[0212] Assistant: 405 South is clear from Northridge to LAX, then slows down to Long Beach, then is clear to Irvine

[0213] Example 2: Buying a CD

[0214] Assistant: How can I help you?

[0215] User: I want to buy CDs

[0216] Assistant: Would you like to buy CDs from Amazon, CD Now or find the cheapest price?

[0217] User: I like Amazon.

[0218] Assistant: Please tell me the name of the CD or the artist . . .

[0219] Example 3: Post Office Directions

[0220] Assistant: How can I help you?

[0221] User: Post Office

[0222] Assistant: Say stamps to buy stamps, say directions to get the directions to the post office or say shipping to get shipping status for parcels at the post office

[0223] User: I want directions.

[0224] Assistant: The nearest post office is . . .

[0225] Example 4: Travel to New York

[0226] Assistant: How can I help you?

[0227] User: I want to fly to New York tonight

[0228] Assistant: Please wait while I find the best fare using Expedia . . . The fare to New York is $356. Would you like to buy it?

[0229] User: Please do and book me a hotel near my office.

[0230] Assistant: Your preferred hotel Sheraton in Manhattan is $227. Would you like to book it?

[0231] User: Please do and also get me a rental car.

[0232] Assistant: You are all set. Thanks for using HeyAnita

[0233] Upon Arrival in New York

[0234] User: I need directions to the hotel.

[0235] HeyAnita Recognizes that the Call Originates from a JFK Airport Phone Number

[0236] Assistant: Directions to your hotel in Manhattan.

[0237] Feature: Organized Catalog

[0238] The way in which data is added and stored is also important in creating a navigable application via the Anita Prompt Generator 6. Information is organized in a “tree” structure 140 as shown in FIG. 4. FIG. 4 shows this organized tree of information and illustrates how clarification questions are asked while narrowing down the search.

[0239] Unlike with the Internet, the creator of a VRU can plan and control the creation and growth of this tree so that it remains usable.

[0240] Feature: Self-Discovering Features

[0241] While traveling down through the tree the user can discover the functions and features of the nodes below.

[0242] Each parent node describes the set of features in the child node.

[0243] Examples:

[0244] Shopping=Buy Books, Buy Electronics

[0245] Buy Electronics=Buy CD Players, Buy VCRs

[0246] News=Headlines, Weather, Financial, Sports

[0247] Sports=Football, Basketball, Soccer

[0248] Football=Football Headlines, Football Scores, Football Odds

[0249] Football Headlines=ESPN Football Headlines, CBS Football Headlines
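The parent/child examples above map naturally onto a dictionary of node names. The following is a minimal sketch, assuming that “Financial” and “Sports” are separate children of News; the `CATALOG` layout and the helper names `children`, `describe`, and `path_to` are hypothetical and are not part of the described system.

```python
# Catalog tree built from the examples in the text; keys are parent nodes.
CATALOG = {
    "Shopping": ["Buy Books", "Buy Electronics"],
    "Buy Electronics": ["Buy CD Players", "Buy VCRs"],
    "News": ["Headlines", "Weather", "Financial", "Sports"],
    "Sports": ["Football", "Basketball", "Soccer"],
    "Football": ["Football Headlines", "Football Scores", "Football Odds"],
    "Football Headlines": ["ESPN Football Headlines", "CBS Football Headlines"],
}


def children(node):
    """Return the child nodes of a node, or an empty list for a leaf."""
    return CATALOG.get(node, [])


def describe(node):
    """Self-discovery prompt: a parent announces the features of its children."""
    kids = children(node)
    if not kids:
        return node + " is an end destination."
    return "Under " + node + " you can say: " + ", ".join(kids) + "."


def path_to(target, root):
    """Depth-first search for the chain of nodes leading from root to target."""
    if root == target:
        return [root]
    for child in children(root):
        sub = path_to(target, child)
        if sub:
            return [root] + sub
    return []


prompt = describe("Sports")
path = path_to("CBS Football Headlines", "News")
```

Here `describe` plays the role of a parent node announcing its children, and `path_to` shows the sequence of narrowing steps a caller would take to reach a leaf.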

[0250] Feature: Context Sensitive Results

[0251] This tree concept also gives context to the search. For example, if the user just said “Amazon” from the context of the main menu, then the user would be asked if they wanted to “buy books from Amazon” or to “buy CDs from Amazon”, but if the user said the same thing from the context of the books sub-tree, then they would be taken directly to the section where they can buy books from Amazon.
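A small sketch can make the Amazon example concrete: the same keyword is matched only against the destinations below the caller's current node, so the result depends on where in the tree the caller is. The tree contents and the names `TREE`, `leaves_under`, and `resolve` are hypothetical, and simple substring matching stands in for whatever matching the system actually uses.

```python
# Hypothetical sub-tree; leaf names follow the wording used in the example.
TREE = {
    "Main Menu": ["Books", "Music"],
    "Books": ["buy books from Amazon"],
    "Music": ["buy CDs from Amazon", "buy CDs from CD Now"],
}


def leaves_under(node):
    """Collect every end destination reachable from the given node."""
    kids = TREE.get(node)
    if not kids:
        return [node]
    found = []
    for kid in kids:
        found.extend(leaves_under(kid))
    return found


def resolve(keyword, context):
    """Return ("go", leaf), ("clarify", question) or ("unknown", "")."""
    matches = [leaf for leaf in leaves_under(context)
               if keyword.lower() in leaf.lower()]
    if len(matches) == 1:
        return ("go", matches[0])
    if matches:
        return ("clarify", "Would you like to " + " or ".join(matches) + "?")
    return ("unknown", "")


from_main = resolve("Amazon", "Main Menu")
from_books = resolve("Amazon", "Books")
```

From the main menu two destinations match, so a clarification question is produced; from the books sub-tree only one matches, so the caller is taken there directly.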

[0252] Feature: User Preferences

[0253] HeyAnita is a learning system. It continually accumulates information about how users interact with it and modifies its search mechanism based on users' navigational history and preferences.

[0254] Example: If it finds that a particular user always buys books from Amazon, it will take him directly to “Buy Books from Amazon” when he says, “Buy Books”.
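One way to picture this learning behavior is a per-user tally of which destination each request ended at. The sketch below is an assumption-laden illustration, not the disclosed mechanism: the class name `PreferenceLearner`, the `min_uses` threshold, and the rule that a shortcut applies only when every recorded choice agrees are all hypothetical.

```python
from collections import Counter, defaultdict


class PreferenceLearner:
    """Tracks where each user's requests end up and suggests shortcuts."""

    def __init__(self, min_uses=3):
        # Assumed threshold: how many observations count as "always".
        self.min_uses = min_uses
        self.history = defaultdict(Counter)

    def record(self, user, request, destination):
        """Remember that this request from this user ended at destination."""
        self.history[(user, request)][destination] += 1

    def shortcut(self, user, request):
        """Return a destination only if the user has always chosen it."""
        counts = self.history.get((user, request))
        if not counts:
            return None
        destination, hits = counts.most_common(1)[0]
        total = sum(counts.values())
        if total >= self.min_uses and hits == total:
            return destination
        return None


learner = PreferenceLearner()
for _ in range(3):
    learner.record("caller-1", "Buy Books", "Buy Books from Amazon")
suggested = learner.shortcut("caller-1", "Buy Books")
```

After three identical choices the sketch skips the clarification step for that caller; a single different choice would remove the shortcut again, since the user no longer "always" picks the same destination.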

[0255] While the invention has been described with respect to the described embodiments in accordance therewith, it will be apparent to those skilled in the art that various modifications and improvements may be made without departing from the scope and spirit of the invention. For example, the inventive concepts herein may be applied to wired or wireless telephony or other audio and voice access systems, based on the Internet, IP network, or other network technologies and protocols, for informational or other applications, without departing from the scope and spirit of the present invention. Accordingly, it is to be understood that the invention is not to be limited by the specific illustrated embodiments, but only by the scope of the appended claims.

1. An interactive audio response system that permits users to access information that is not originally formatted for voice interfacing to an information exchange network, comprising: a voice interface for user to input request for information; a speech recognition engine that converts user's spoken utterance from the voice interface into text; a natural language engine that interprets the meaning and context embodied in the converted text and output structured commands; a query engine that, in response to the structured commands, determines an end destination node for the user's request and generates corresponding web queries; a web parser that, in response to the web queries, browses the web to retrieve information requested by user, and parses each received page from the web to convert unstructured text into structured datasets; and a prompt generator that generates context-sensitive voice prompts to the voice interface in the event that an end destination node cannot be determined by the query engine.
2. A system as in claim 1, further comprising: a profiler that stores user preferences and query history data from the query engine; an ad generator that, in response to the prompt generator, generates a set of commercials based on user's preferences and context which was retrieved via the web parser.
3. A system as in claims 1 or 2, wherein the prompt generator generates voice prompts in accordance with a hierarchy tree structure.
4. An interactive system as in any one of claims 1 to 3, wherein the voice interface is a telephony interface.
5. An interactive system as in any one of claims 1 to 4, wherein the information exchange network is the Internet.
6. An interactive system as in any one of claims 1 to 5, wherein the system is based on an operating system comprising: speech objects; speech object COM++ DLLs; an agent (OLE DB); and a framework of plug-and-play COM+ components to facilitate rapid development and deployment of voice applications without reformatting information not originally formatted for voice interfacing.
7. An interactive system as in claim 6, wherein the framework comprises: basic components for basic building blocks for constructing a voice application; data-bound components that implements standardized voice interface on top of commonly used data elements; and value-added components that provides value added features of the voice interface.